[mlir] Add Normalize pass #162266

Sh0g0-1758 · 2025-10-07T11:45:47Z

This PR adds a pass that reduces diff noise between two roughly semantically similar MLIR modules. It works in two steps.

The first step is instruction reordering. Two semantically similar modules should have roughly the same kinds of side effects and the same control-flow graph structure. So we collect the output operations (operations with the IsTerminator trait, operations with a MemoryEffects::Write side effect, and call operations) and walk them top-down recursively, so that for each operand we bring its definition as close as possible to the using operation.

The second step is deterministic SSA naming. Simple linear naming won't work, because adding any new operation to one of two roughly semantically similar modules would change the diff a lot. The naming scheme we have chosen derives from the one introduced in llvm-canon (refer to this talk for further details).

One aspect of it might need some discussion. While naming initial operations, we collect outputFootprint as the distance of output operations from the beginning of the function. But this means that adding any redundant operation will change the names of all the initial operations, which would pollute the diff a lot. Instead, perhaps we could simply add the number of output operations using that initial operation to the hash?
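The reordering step described above can be sketched on a toy IR. This is a simplified stand-in and not the code from the patch: `ToyOp`, `reorder`, `resultNames`, and the `isOutput` flag are hypothetical names, and regions, dominance, and real MLIR side-effect queries are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy operation: a result name, the names of the values it
// uses, and a flag marking it as an "output" (terminator, write, or call).
struct ToyOp {
  std::string result;
  std::vector<std::string> operands;
  bool isOutput;
};

// Returns the index of the op defining `name`, or -1 if `name` is not
// defined in `ops` (for example, a function argument).
static int findDef(const std::vector<ToyOp> &ops, const std::string &name) {
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i].result == name)
      return static_cast<int>(i);
  return -1;
}

// Emits the definitions of all operands of ops[idx] (depth first), then
// ops[idx] itself, so each definition lands just before its first user.
static void emitWithDefs(const std::vector<ToyOp> &ops, int idx,
                         std::set<int> &visited, std::vector<ToyOp> &out) {
  if (!visited.insert(idx).second)
    return;
  for (const std::string &operand : ops[idx].operands) {
    int def = findDef(ops, operand);
    if (def >= 0)
      emitWithDefs(ops, def, visited, out);
  }
  out.push_back(ops[idx]);
}

// Reorders a single block: outputs are visited in their original order and
// each one pulls its not-yet-emitted definitions in front of it. Ops that no
// output reaches are appended at the end in input order.
std::vector<ToyOp> reorder(const std::vector<ToyOp> &ops) {
  std::vector<ToyOp> out;
  std::set<int> visited;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i].isOutput)
      emitWithDefs(ops, static_cast<int>(i), visited, out);
  for (std::size_t i = 0; i < ops.size(); ++i)
    emitWithDefs(ops, static_cast<int>(i), visited, out);
  assert(out.size() == ops.size());
  return out;
}

// Convenience helper: the result names of `ops`, in order.
std::vector<std::string> resultNames(const std::vector<ToyOp> &ops) {
  std::vector<std::string> names;
  for (const ToyOp &op : ops)
    names.push_back(op.result);
  return names;
}
```

In this sketch, two inputs that differ only in the order of independent definitions come out in the same order, which is the property the pass is after.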

Patch made in collaboration with @anant37289

github-actions · 2025-10-07T11:49:15Z

✅ With the latest revision this PR passed the C/C++ code formatter.

jpienaar

Could you expand the PR description, as well as the pass and function comments, to capture the goals, what's working, etc.?

mlir/include/mlir/Conversion/Passes.td

mlir/lib/Conversion/Normalize/Normalize.cpp

jpienaar · 2025-10-16T08:35:58Z

mlir/lib/Conversion/Normalize/Normalize.cpp

+
+  uint64_t Hash = MagicHashConstant;
+
+  uint64_t opcodeHash = strHash(op->getName().getStringRef().str());


StringRef has hash_value, could that be used?

I don't think so. IIRC, set_fixed_execution_hash_seed has been removed in the latest tree, which means llvm::hash_value currently gives a different hash in subsequent runs (unless you rebuild LLVM with certain flags). I believe the user would want the SSA names to be consistent across runs.
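To illustrate what "consistent across runs" requires, here is a minimal sketch of a seed-free string hash. The patch's `strHash` is not shown in this thread, so 64-bit FNV-1a is swapped in purely as a well-known example; `stableHash` and `shortName` are hypothetical names, and the `vl` prefix with five digits only loosely mirrors the names visible in the test output.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a. There is no per-process seed, so the same bytes always
// produce the same value, in every run and on every machine.
std::uint64_t stableHash(const std::string &s) {
  std::uint64_t h = 0xcbf29ce484222325ULL; // FNV offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 0x100000001b3ULL; // FNV prime
  }
  return h;
}

// Hypothetical name builder: "vl" followed by the first five decimal digits
// of the hash (fewer if the hash has fewer than five digits).
std::string shortName(const std::string &opName) {
  return "vl" + std::to_string(stableHash(opName)).substr(0, 5);
}
```

A call such as `shortName("arith.constant")` then yields the same string on every invocation, which is the property a seeded hash cannot guarantee.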

mlir/lib/Conversion/Normalize/Normalize.cpp

jpienaar · 2025-10-16T08:38:59Z

mlir/lib/Conversion/Normalize/Normalize.cpp

+  if (op.hasTrait<OpTrait::IsTerminator>())
+    return true;
+
+  if (auto memOp = dyn_cast<MemoryEffectOpInterface>(&op)) {


Could you comment as to why?

Can you check the updated description once and let me know if it's still not clear?

The comment should be in the code too, not just description. But no it doesn't really explain why. It just says it is done. IRNormalizer.cpp makes it clearer as not to reorder around side-effecting operations.

jpienaar · 2025-10-16T08:39:52Z

mlir/lib/Conversion/Normalize/Normalize.cpp

+    Operands.push_back({Stream.str(), operand});
+  }
+
+  if (op->hasTrait<OpTrait::IsCommutative>()) {


Commutative operations are naturally reordered while considering constants; would this be considered here too, given the naming conventions?

The constants would have been hashed during the nameAsInitialOperation call, so if the operation is commutative, the constant operands would get reordered accordingly.
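A minimal sketch of the operand handling discussed here, assuming operands are compared by their already-assigned canonical names (the commit list mentions "operand reordering in alphabetical order"). `canonicalOperands` is a hypothetical helper, not a function from the patch.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns the operand names in the order they would feed into the hash:
// sorted alphabetically when the op is commutative, untouched otherwise.
std::vector<std::string> canonicalOperands(std::vector<std::string> operands,
                                           bool isCommutative) {
  if (isCommutative)
    std::sort(operands.begin(), operands.end());
  return operands;
}
```

With this, `addi %b, %a` and `addi %a, %b` contribute the same operand sequence, while a non-commutative op such as a subtraction keeps its operand order.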

mlir/lib/Conversion/Normalize/Normalize.cpp

matthias-springer · 2025-10-16T10:01:50Z

mlir/test/Conversion/Normalize/infinite-loop.mlir

+
+// CHECK-LABEL:   module {
+// CHECK:           func.func @infinte_loop(%[[ARG0:.*]]: memref<?xi32>, %[[ARG1:.*]]: i32) {
+// CHECK:           %vl15969$e5677$ = arith.constant 1 : i32


What's up with these cryptic SSA variable names?

That's the point of llvm-canon, it gives everything predictable "canonical" names so when you compare two files textually, they don't differ because of SSA value numbers.

you can refer to this talk to further understand the motivation behind llvm-canon.

mlir/lib/Conversion/Normalize/Normalize.cpp

matthias-springer · 2025-10-16T10:04:01Z

mlir/lib/Conversion/Normalize/Normalize.cpp

+private:
+  const uint64_t MagicHashConstant = 0x6acaa36bef8325c5ULL;
+  void
+  collectOutputOperations(Block &block,


All of the functions here should have documentation.

I think you may have forgotten to upload the change.

I also like how LLVM's pass grouped these (llvm/lib/Transforms/Utils/IRNormalizer.cpp)

I have added documentation above the definition of each function, though I left out two or three functions whose definitions were short or whose functionality was clear from the name.

ftynse

Thanks for your patch! Please provide a clear commit description, explaining why you make changes, not only what they are: https://mlir.llvm.org/getting_started/Contributing/#commit-messages.

Two large items will need addressing: (1) this does not belong in lib/Conversion, because it is not a conversion and it will have to operate generally on all MLIR operations regardless of dialect, and (2) we need to find a way of testing the desired goal of the tool, which is minimizing the textual difference between files to the "semantically meaningful" parts.

anant37289 · 2025-10-21T13:35:00Z

(1) this does not belong to lib/Conversion because it is not a conversion and it will have to operate generally on all MLIR operations regardless of dialects

Should we make a separate tool out of it or place it in some other directory like rewrite..?

(2) we need to find a way of testing the desired goal of the tool which is minimizing textual difference between files to "semantically meaningful" parts.

We can try making two forms of the same initial code by applying one or two different optimizations, and expect small diffs because the two forms are semantically similar, as mentioned here for MIR canon.

jpienaar

I don't see any test with side-effecting ops

mlir/include/mlir/Conversion/Passes.td

jpienaar · 2025-10-24T11:13:09Z

mlir/include/mlir/Conversion/Passes.td

+//===----------------------------------------------------------------------===//
+// Normalize
+//===----------------------------------------------------------------------===//
+def Normalize : Pass<"normalize", "ModuleOp"> {


Is it required to be a ModuleOp pass? Does it need to be at symbol table scope?

Well we do need the top-level module operation to collect all the output operations and thus reorder/rename regular/initial operations if they have not been visited earlier by another output operation using them.

I don't quite follow this. I may be missing what you are saying as I think this is true and could work even for functions/doesn't even need to be on top-level ops. But it does need to (correct me if I'm wrong) run sequentially due to rng update ordering.

mlir/lib/Conversion/Normalize/Normalize.cpp

jpienaar · 2025-10-24T11:35:43Z

mlir/lib/Conversion/Normalize/Normalize.cpp

+    visited.insert(op);
+
+    if (isOutput(*op)) {
+      func::FuncOp func = op->getParentOfType<func::FuncOp>();


You can avoid making this on FuncOp and just query the Block in which this operation is.

But we need the enclosing function to get different footprint for output operations at different depth. If we measure the footprint from block, then two output operations at same depth from their enclosing block would have same footprint.

That's true for a FuncOp too, but what is indeed different is that FuncOp is IsolatedFromAbove. Would that be sufficient?

My goal here is to not hardcode an assumption of 1 top level module with N functions underneath. But enable it to run even where there is another top level op and other region ops.

mlir/lib/Conversion/Normalize/Normalize.cpp

jpienaar · 2025-10-24T11:39:48Z

mlir/lib/Conversion/Normalize/Normalize.cpp

+/// Helper method returning indices (distance from the beginning of the basic
+/// block) of output operations using the given operation. Walks down the
+/// def-use tree recursively
+llvm::SetVector<int> NormalizePass::getOutputFootprint(


How is this used below? Seems to be for hashing, but not sure I followed the logic.

The output footprint is used to differentiate between initial operations that have the same definition but are used by different output operations.
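The idea can be sketched on a toy def-use graph. This is an illustration under simplifying assumptions, not the patch's `getOutputFootprint`: `ToyNode` and `outputFootprint` are hypothetical, positions are plain indices into one block, and a `std::set` stands in for `llvm::SetVector`.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy op: the indices of the ops that use its result, and a
// flag marking it as an output operation.
struct ToyNode {
  std::vector<int> users;
  bool isOutput;
};

// Walks down the def-use graph from `idx`, recording the position of every
// output operation that is reached.
static void walkUsers(const std::vector<ToyNode> &block, int idx,
                      std::set<int> &seen, std::set<int> &footprint) {
  assert(idx >= 0 && static_cast<std::size_t>(idx) < block.size());
  if (!seen.insert(idx).second)
    return;
  if (block[idx].isOutput)
    footprint.insert(idx);
  for (int user : block[idx].users)
    walkUsers(block, user, seen, footprint);
}

// Positions of all outputs that transitively use the op at `idx`.
std::set<int> outputFootprint(const std::vector<ToyNode> &block, int idx) {
  std::set<int> seen, footprint;
  walkUsers(block, idx, seen, footprint);
  return footprint;
}
```

Two otherwise identical constants that feed different outputs get different footprints here, so mixing the footprint into the hash gives them different names. It also shows the concern raised in the PR description: inserting an unrelated op shifts every position after it.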

jpienaar · 2025-10-24T11:46:03Z

mlir/test/Conversion/Normalize/infinite-loop.mlir

+
+// CHECK-LABEL:   module {
+// CHECK:           func.func @infinte_loop(%[[ARG0:.*]]: memref<?xi32>, %[[ARG1:.*]]: i32) {
+// CHECK:           %vl15969$e5677$ = arith.constant 1 : i32


Yeah, it's useful for that, but indeed terrible to read :) (I almost wonder whether the diff tool post would be smart enough to reduce it to something smaller, or perhaps both files could be given to the diffing step so that names could be chosen with more care, e.g., take all names across both files and rename across both: s/%vl15390$funcArg1-vl15969$/%0/, etc.)
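The renaming idea in this comment could look roughly like the sketch below. It is a hypothetical post-processing helper, not part of the patch: `shorten` and the shared `table` are invented for illustration, and the regular expression is a rough approximation of SSA value names that does not handle every MLIR identifier form.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Rewrites every SSA-like name in `text` to a short name taken from `table`,
// extending `table` with "%0", "%1", ... the first time a name is seen.
// Passing the same table for both files keeps the short names aligned.
std::string shorten(const std::string &text,
                    std::map<std::string, std::string> &table) {
  static const std::regex ssaName(R"(%[A-Za-z0-9_$.-]+)");
  std::string out;
  std::size_t last = 0;
  for (auto it = std::sregex_iterator(text.begin(), text.end(), ssaName);
       it != std::sregex_iterator(); ++it) {
    const std::size_t pos = static_cast<std::size_t>(it->position());
    out.append(text, last, pos - last);
    // emplace leaves an existing entry untouched and returns it.
    auto entry = table.emplace(it->str(), "%" + std::to_string(table.size()));
    out += entry.first->second;
    last = pos + static_cast<std::size_t>(it->length());
  }
  out.append(text, last, std::string::npos);
  return out;
}
```

Running both normalized files through `shorten` with one shared table would keep the diff stable while making the names readable again.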

jpienaar · 2025-10-24T11:48:56Z

mlir/test/Conversion/Normalize/infinite-loop.mlir

+    %tmp41 = arith.xori %tmp40, %cneg1 : i32
+    %tmp42 = arith.addi %tmp39, %tmp41 : i32
+    %tmp43 = arith.addi %tmp42, %c0 : i32
+    %tmp44 = arith.muli %tmp43, %tmp40 : i32


A lot of these seem like they can't be reordered, because they form a rather linear sequence here. How about using an N-node tree-like input structure, linearized, so that you could have both short-range and long-range reorderings? (Of course many other generated inputs could work; the goal is just to know that many variants are possible and that, given two variants, we see the same output.)

Well this test was largely inspired by a similar test in llvm-canon pr : https://github.com/llvm/llvm-project/pull/113780/files#diff-25aa6a3cb4c2af6c07edea9fa7b12a1afefa1337b2bb1802cb4658a3a2fb6353 .

But I have modified our infinite-loop test into 2 variants with tree-like dependencies that will produce the same canonical form after the normalize pass reorders them.

jpienaar · 2025-11-06T18:01:02Z

mlir/lib/Conversion/CMakeLists.txt

Just +1 to Alex's point that this isn't a Conversion in MLIR terminology (https://mlir.llvm.org/getting_started/Glossary/#conversion). Transforms could be a better spot, although I was also wondering whether the Tools directory wouldn't be better, as it is rather specific when one runs this.

jpienaar · 2025-11-06T18:17:31Z

mlir/test/Conversion/Normalize/infinite-loop.mlir

+    %tmp41 = arith.xori %tmp40, %cneg1 : i32
+    %tmp42 = arith.addi %tmp39, %tmp41 : i32
+    %tmp43 = arith.addi %tmp42, %c0 : i32
+    %tmp44 = arith.muli %tmp43, %tmp40 : i32


I'm not seeing it, I see

    %tmp23 = arith.muli %tmp22, %tmp19 : i32
    %tmp24 = arith.xori %tmp23, %cneg1 : i32
    %tmp25 = arith.addi %tmp22, %tmp24 : i32
    %tmp26 = arith.addi %tmp25, %c0 : i32
    %tmp27 = arith.muli %tmp26, %tmp23 : i32
    %tmp28 = arith.xori %tmp27, %cneg1 : i32
    %tmp29 = arith.addi %tmp26, %tmp28 : i32
    %tmp30 = arith.addi %tmp29, %c0 : i32
    ...

and you have 24 depending on 23, 25 on 24, 26 on 25, 27 on 26, 28 on 27, 29 on 28, and 30 on 29, so no reordering is possible in this snippet.
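One way to check how much freedom a test input gives the pass is to count its valid orderings. The sketch below is a hypothetical helper (`countValidOrders` is not part of the patch) that counts the topological orders of a small dependency graph with a bitmask dynamic program; it assumes every op's dependencies are listed by index and that the snippet has fewer than 20 ops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// deps[i] lists the indices of the ops that op i depends on. Returns the
// number of orderings in which every op comes after all of its dependencies.
std::uint64_t countValidOrders(const std::vector<std::vector<int>> &deps) {
  const std::size_t n = deps.size();
  assert(n < 20 && "bitmask DP is only meant for small snippets");
  // ways[mask] = number of ways to schedule exactly the ops in `mask`.
  std::vector<std::uint64_t> ways(std::size_t{1} << n, 0);
  ways[0] = 1;
  for (std::size_t mask = 0; mask < ways.size(); ++mask) {
    if (ways[mask] == 0)
      continue;
    for (std::size_t i = 0; i < n; ++i) {
      if (mask & (std::size_t{1} << i))
        continue; // op i is already scheduled
      bool ready = true;
      for (int d : deps[i])
        if (!(mask & (std::size_t{1} << d)))
          ready = false;
      if (ready)
        ways[mask | (std::size_t{1} << i)] += ways[mask];
    }
  }
  return ways.back();
}
```

Encoding the eight ops quoted above (only dependencies inside the snippet) gives exactly one valid order, whereas even a tiny tree with two independent leaves feeding one op gives two.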

jpienaar · 2025-11-06T18:17:55Z

mlir/test/Conversion/Normalize/infinite-loop.mlir

Could you rerun the input through mlir-opt? (The spacing just feels off here.)

jpienaar · 2025-11-06T18:18:51Z

mlir/lib/Conversion/Normalize/Normalize.cpp

+        return true;
+  }
+
+  if (auto call = dyn_cast<func::CallOp>(op))


You've resolved the comment, but I still see this on func::CallOp

Sh0g0-1758 added 9 commits September 20, 2025 16:58

setup the normalize pass

3897342

reorder instructions

5aa301b

rename op results

dda4e0b

naming scheme

fe9a6fa

initial and regular operand renaming

1a8ef2f

rename without folding with hashing

48aa67b

operation folding

8a9efd1

nit

c94e6e6

Merge branch 'llvm:main' into mlir-canon

dab3957

Sh0g0-1758 added the mlir label Oct 7, 2025

Sh0g0-1758 self-assigned this Oct 7, 2025

Sh0g0-1758 added 3 commits October 7, 2025 17:49

operand reordering in alphabetical order

8c487ff

refactor impl

9c0e866

clang-format

0836dad

Sh0g0-1758 force-pushed the mlir-canon branch from d14a41f to 0836dad Compare October 10, 2025 01:58

Sh0g0-1758 added 10 commits October 10, 2025 07:36

linux build fix

ad15039

add reordering test

1256ff1

add infinite loop test and fix block/func arg naming

89218e9

rename constants when used in an initial instruction

f4d0c63

refactor repeated logic into foldOperations

9ed096d

clang-format

bbbfe21

fix infinite-loop test

22cdd91

nit

76df868

nit

8387f1c

test rename

1823295

Sh0g0-1758 marked this pull request as ready for review October 10, 2025 14:27

Sh0g0-1758 changed the title ~~[WIP][mlir] Add Normalize pass~~ [mlir] Add Normalize pass Oct 10, 2025

adding usedef chain test

7b590b0

Sh0g0-1758 requested a review from jpienaar October 10, 2025 14:41

jpienaar reviewed Oct 16, 2025

matthias-springer reviewed Oct 16, 2025

ftynse reviewed Oct 16, 2025

Sh0g0-1758 added 3 commits October 17, 2025 15:49

nit

9f86026

add docs

2313810

camelCase

57c0292

Sh0g0-1758 requested review from ftynse, jpienaar and matthias-springer October 21, 2025 13:39

jpienaar reviewed Oct 24, 2025

Sh0g0-1758 and others added 12 commits October 28, 2025 22:09

fix redundant includes and namespacing

9d8b682

nit

319bcae

nit

137f96c

nit

9049f11

nit

759e798

nit

4cfc8eb

early break

f663b14

use std::distance

195f88e

walk the module

ac6ce80

Add side-effect testcases

d137381

Merge branch 'main' into mlir-canon

c9a742b

modify infinite-loop test into 2 variants with tree-like dependencies…

bb41dc4

… that will produce the same canonical form after the normalize pass reorders them

Sh0g0-1758 requested a review from jpienaar October 29, 2025 11:01

jpienaar reviewed Nov 6, 2025

